REMARKS 

Claims 42, 43 and 45 were pending in the case. Upon entry of this 
Amendment, Claim 45 will be cancelled, and Claims 42 and 43 will remain pending 
in the case. 

I. The Sequence Listing 

The Office Action objects to the Sequence Listing provided with the case. It 
alleges that at page 14, lines 18-19 nucleic acid sequences are included which are in 
excess of 10 bases, but which lack any Seq. ID. No. It adds that the sequences are 
similar, but not identical to Seq. ID. Nos. 3 and 4, but differ at page 14 in that they 
comprise inosine residues. The Office Action states that the sequences must be 
disclosed in the Sequence Listing, and that Applicants must provide a new sequence 
listing in paper and computer readable form and a statement that the two are the 
same and that no new matter is introduced. 

In response, Applicants provide a new Sequence Listing in both paper and 
computer readable form, together with the requested Statement of identity and no 
new matter. Applicants believe the foregoing will overcome the Office Action's 
objection to the Sequence Listing. 

II. The Objections to Claims 42 and 43 

The Office Action objects to Claim 42, alleging that as amended, it is drawn to 
non-elected subject matter. The Office Action alleges that Applicants elected 
methods of screening compounds which alter the conductive properties of 
acetylcholine receptors. The Office Action adds that Claim 42 was amended to 
encompass methods of screening compounds that alter at least one property of an 
acetylcholine receptor, and therefore embraces receptor properties other than 
conductivity, and is therefore alleged by the Office Action to be beyond the scope of 
the elected invention. 

The Office Action objects to Claim 43, alleging that by its amendment it 
embraces non-elected subject matter, but does not specify with 
particularity how Claim 43, as previously amended, extends beyond the scope of the 
elected material. 

By way of response, Applicants have amended Claim 42 to limit it to the 
conductive property as requested by the Office Action. 

Applicants are not sure why Claim 43 was believed by the Office Action to 
encompass non-elected subject matter, but respectfully assert that Claim 43 as 
amended by this Amendment is directed to elected subject matter. Applicants would 
appreciate discussing Claim 43 with the Examiner in a telephone conversation, if 
Claim 43 as amended herein is not deemed allowable, before the Examiner issues 
another Office Action with respect to Claim 43. 

Applicants believe the objections to Claims 42 and 43 have been overcome, 
and review and reconsideration of the claims are respectfully requested. 

III. The Section 112, First Paragraph Rejections - Written Description 

Claims 42, 43 and 45 stand rejected under 35 U.S.C. Section 112, first 
paragraph, for the reasons of record in paper number 25. Applicants were unable to 
locate a paper no. 25, and respectfully request the Examiner to identify paper 
number 25 with greater particularity. 

The Office Action goes on to state that Claims 42 and 45 are drawn to 
methods of screening compounds which alter the conductive properties of 
acetylcholine receptors, and that Claim 43 is drawn to a method of identifying 
compounds that bind to an acetylcholine receptor. The Office Action adds that the 
methods employ nucleic acids from the group listed on page 5 of the Office Action, 
and that embodiments of Claim 45 employ polypeptides encoded by the nucleic 
acids set forth above. 

The Office Action adds that the first of these groups of nucleic acids 
(Applicants assume the Office Action refers to nucleic acids consisting of Seq. ID. 
No. 1) is fully disclosed and adequately described, which Applicants 
acknowledge with appreciation. 

The Office Action adds that similarly, the polypeptide of Seq. ID. No. 2 is fully 
disclosed and adequately described, which Applicants also acknowledge with 
appreciation. 

The Office Action alleges that the remaining groups of sequences represent 
genuses. 

The Office Action alleges that the genus is not enabled, alleging, in summary, 
that there is disclosure of only a single member (e.g. SEQ ID NO.:2) of the claimed 
genuses, and that no relevant identifying characteristic is described, such as a 
correlation between a specific structure and the required function. 

By way of response, Applicants have cancelled Claim 45, and have amended 
Claim 42 to limit it to the conductive property and have otherwise amended Claims 
42 and 43 to address the issues raised by this section of the Office Action. 

Applicants believe amended Claims 42 and 43 overcome these 
objections/rejections and are in condition for allowance. Review and reconsideration 
of Claims 42 and 43 as amended herein is respectfully requested. 

IV. The Section 112, First Paragraph Rejections - Enablement 

Claims 42, 43 and 45 stand rejected under 35 U.S.C. Section 112, first 
paragraph, on the basis that the specification, while enabling for methods of 
screening compounds which alter the conductive properties of acetylcholine 
receptors, or which bind to acetylcholine receptors as listed in the Office Action, 
allegedly does not reasonably provide enablement for such methods: 

that employ subsequences of SEQ ID NO: 1 that are only 14 bases in length, 

or 

that employ nucleic acid sequences encoding polypeptides that differ from the 
entire sequence of SEQ ID NO:2 or from subsequences of SEQ ID NO:2, or 

for methods that employ variants of SEQ ID NO:2 or variants of its 
subsequences. 

The Office Action alleges that the specification does not enable compounds 
useful for either or both of crop protection and pharmaceutical treatment of humans. 
The Office Action alleges that the specification does not enable any person skilled in 
the art to which it pertains, or with which it is most closely connected, to make the 
invention commensurate in scope with these claims. 

The Office Action then provides a discussion of SEQ. ID NO:1, and concludes 
that it is unrealistic to assume that a nucleic acid of only 14 nucleotides, or its 
complement, could encode a polypeptide with the claimed function. 

The Office Action then discusses nucleic acids that encode variants of SEQ 
ID NO:2 or its subsequences, concluding that the specification is not enabling 
because it does not teach which variants will provide a functional beta subunit and 
which will not. The Office Action, in summary, concludes that determining this will 
require undue experimentation. 

The Office Action also alleges the specification fails to provide an enabling 
disclosure because it fails to teach what is useful for crop protection and/or 
pharmaceutical treatment of humans. 

The Office Action also alleges that the specification fails to provide an 
enabling disclosure because essential elements of the invention are incorporated by 
reference to a non-patent publication in its reference to the use of GCG program 
GAP, Version 10.0 using "standard settings". 

Claims 42, 43 and 45 are rejected under 35 U.S.C. Section 112, second 
paragraph, with the Office Action alleging that the claims are indefinite because they 
recite "the biological function of an acetylcholine receptor" without antecedent basis. 
The Office Action adds that acetylcholine receptors can be viewed as having several 
biological functions, concluding that the claims fail to stipulate what biological 
function is intended. 

The Office Action also alleges that Claims 42, 43 and 45 are unclear in that 
the metes and bounds of the invention are unclear. It adds that Claims 42, 43 and 
45 are indefinite because they recite "the sequence between position 43 and 1368 of 
SEQ ID NO:1" without antecedent basis. The Office Action adds that there are many 
sequences between positions 43 and 1368 of SEQ ID NO:1 , and the Office Action 
alleges that it is unclear to which sequence Applicants refer. The Office Action most 
helpfully adds that substitution of the phrase "the sequence comprising positions 
43-1368 of SEQ ID NO:1" for the phrase "the sequence between position 43 and 
1368 of SEQ ID NO:1" would be acceptable. 

Claims 42, 43 and 45 are rejected under 35 U.S.C. Section 112, second 
paragraph as indefinite on the basis that it is allegedly unclear what is intended by a 
"70% identity" or a "40% identity". The Office Action adds that the specification 
teaches that preferably identity is calculated using the GCG program GAP, version 
10.0, using the "standard settings". The Office Action adds that one of skill in the art 
would appreciate that there are other available programs which may be used, and 
that there are a variety of user-chosen parameters which may be varied, and that the 
calculated identity will vary with the program and parameter values chosen. 
Because the claims fail to set forth the program and parameter values used in the 
calculation, the Office Action alleges that one cannot know what is intended by "70% 
identical" or "40% identical". 

By way of response, Applicants have cancelled Claim 45 and provide newly 
amended Claims 42 and 43 which Applicants believe overcome each of the grounds 
asserted for rejection above. 

Applicants add that with respect to the nucleic acids that encode variants of 
SEQ ID NO:2, and the allegation that the specification fails to teach which variants 
will provide a functional beta subunit and which will not, Claims 42 and 43 as 
amended state that the sequence can be selected from a group which includes 
sequences which encode a polypeptide having an amino acid sequence which has 
at least 40% identity to the amino acid sequence as set forth in SEQ. ID NO:2 over 
its entire length. That this sufficiently defines the invention may be supported by an 
article such as Peer Mittl, "Protein Structure Prediction", Biochemisches Institut, 
Universität Zürich (copy enclosed as Attachment A). From the 10th and 11th pages of 
this reference, it can be inferred that proteins are presumed to have a similar 
structure if the amino acid sequence identity is >25% over an alignment of at least 80 
amino acids. Of course, proteins having a similar structure would be expected to 
have a similar function. 

With regard to the allegation in the Office Action that, because the claims fail 
to set forth the program and parameter values used in the GCG program GAP, 
version 10.0, calculation of identity, one cannot know what is intended by 
"70% identical" or "40% identical", Applicants respectfully traverse and assert 
that this allegation is not well founded. Applicants respectfully assert that the 
present specification is much clearer than those of issued patents that are not 
nearly as precise. First, those skilled in the art would understand that most of the 
user-settable parameters of the software do not affect the calculation of identity. 
Second, those skilled in the art would understand that the software is provided with 
"presettings", that is, the standard settings that arrive with the software. Therefore, 
using "standard settings" in line with the specification cannot be interpreted to mean 
that the percentage of identity is changed by setting user-adjustable parameters. 
Those skilled in the art would understand what is meant by the term "standard 
settings" and would all use the same standard settings to arrive at the same 
percentage of identity. 

Applicants submit that the instant application is in condition for allowance. 
Accordingly, early examination and a Notice of Allowance are respectfully requested 
for Claims 42 and 43. If the Examiner is of the opinion that the instant application is 
in condition for other than allowance, he is requested to contact the applicants' 
Attorney at the telephone number given below so that additional changes may be 
discussed. 



Bayer Corporation 
100 Bayer Road 

Pittsburgh, Pennsylvania 15205-9741 
PHONE: (412)777-8366 
FACSIMILE PHONE NUMBER: 
412-777-8363 
s/rmc/rjh/0097 



Respectfully submitted, 




Reg. No. 33,896 

VERSION WITH MARKINGS TO SHOW CHANGES MADE 



IN THE CLAIMS: 

Please cancel Claim 45, and amend Claims 42 and 43 as follows: 

42. (Twice Amended) A method of determining a compound which alters [at 
least one] the conductive property of an acetylcholine receptor, said acetylcholine 
receptor comprising a polypeptide encoded by a nucleic acid [having] comprising a 
sequence selected from the group consisting of a sequence as set forth in SEQ ID 
NO: 1, [subsequences of SEQ ID NO: 1 which are at least 14 base pairs in length, 
sequences which hybridize with SEQ ID NO: 1, sequences which have at least 70% 
identity to the sequence between position 43 and position 1368 of SEQ ID NO: 1, 
sequences which are complementary to SEQ ID NO: 1, and] sequences which, owing 
to the degeneracy of the genetic code, encode the same amino acid sequence as [do 
the sequences defined above] does the sequence as set forth in SEQ ID NO: 1 [, or 
alters at least one property of a polypeptide exerting the biological function of an 
acetylcholine receptor β subunit and comprising an amino acid sequence having at 
least 40% identity to SEQ ID NO: 2, the compound useful for crop protection and/or 
pharmaceutical treatment of humans], and sequences which encode a polypeptide 
having an amino acid sequence which has at least 40% identity to an amino acid 
sequence as set forth in SEQ ID NO: 2 over its entire length, 

the method comprising: 

culturing in the presence of the at least one compound a host cell stably transfected 
or transformed with a nucleic acid comprising a sequence selected from the 
group consisting of a sequence as set forth in SEQ ID NO: 1, [subsequences 
of SEQ ID NO: 1 which are at least 14 base pairs in length, sequences which 
hybridize with SEQ ID NO: 1, sequences which have at least 70% identity to 
the sequence between position 43 and position 1368 of SEQ ID NO: 1, 
sequences which are complementary to SEQ ID NO: 1, and] sequences 
which, owing to the degeneracy of the genetic code, encode the same amino 
acid sequence as [do the sequences defined above or a vector comprising an 
isolated and purified nucleic acid molecule as defined above] does the 
sequence as set forth in SEQ ID NO: 1, and sequences which encode a 
polypeptide having an amino acid sequence which has at least 40% identity to 
the amino acid sequence as set forth in SEQ ID NO: 2 over the entire length, 
and 

detecting the [at least one] altered conductive property of the receptor. 

43. (Twice Amended) A method of determining a compound specifically 
binding to an acetylcholine receptor which compound upon binding alters the 
conductive property of the acetylcholine receptor, said acetylcholine receptor 
comprising a polypeptide encoded by a nucleic acid comprising a sequence selected 
from the group consisting of a sequence as set forth in SEQ ID NO: 1, 
[subsequences of SEQ ID NO: 1 which are at least 14 base pairs in length, 
sequences which hybridize with SEQ ID NO: 1, sequences which have at least 70% 
identity to the sequence between position 43 and position 1368 of SEQ ID NO: 1, 
sequences which are complementary to SEQ ID NO: 1, and] sequences which, owing 
to the degeneracy of the genetic code, encode the same amino acid sequence as [do 
the sequences defined above, or a polypeptide exerting the biological function of an 
acetylcholine receptor β subunit and comprising an amino acid sequence having at 
least 40% identity to SEQ ID NO: 2] does the sequence as set forth in SEQ ID NO: 1, 
and sequences which encode a polypeptide having an amino acid sequence which 
has at least 40% identity to the amino acid sequence as set forth in SEQ ID NO: 2 
over the entire length, 

the method comprising: 

exposing a host cell stably transfected or transformed with a nucleic acid comprising 
a sequence selected from the group consisting of a sequence as set forth in 
SEQ ID NO: 1, [subsequences of SEQ ID NO: 1 which are at least 14 base 
pairs in length, sequences which hybridize with SEQ ID NO: 1, sequences 
which have at least 70% identity to the sequence between position 43 and 
position 1368 of SEQ ID NO: 1, sequences which are complementary to SEQ 
ID NO: 1, and] sequences which, owing to the degeneracy of the genetic 
code, encode the same amino acid sequence as [do the sequences defined 
above or a vector comprising an isolated and purified nucleic acid molecule as 
defined above] does the sequence as set forth in SEQ ID NO: 1, and 
sequences which encode a polypeptide having an amino acid sequence 
which has at least 40% identity to the amino acid sequence as set forth in 
SEQ ID NO: 2 over the entire length, 

or 

exposing a polypeptide encoded by a nucleic acid comprising a sequence selected 
from the group consisting of a sequence as set forth in SEQ ID NO: 1, 
[subsequences of SEQ ID NO: 1 which are at least 14 base pairs in length, 
sequences which hybridize with SEQ ID NO: 1, sequences which have at 
least 70% identity to the sequence between position 43 and position 1368 of 
SEQ ID NO: 1, sequences which are complementary to SEQ ID NO: 1, and] 
sequences which, owing to the degeneracy of the genetic code, encode the 
same amino acid sequence as [do the sequences defined above or a 
polypeptide exerting the biological function of an acetylcholine receptor β 
subunit and comprising an amino acid sequence having at least 40% identity 
to SEQ ID NO: 2] does the sequence as set forth in SEQ ID NO: 1, and 
sequences which encode a polypeptide having an amino acid sequence 
which has at least 40% identity to the amino acid sequence as set forth in 
SEQ ID NO: 2 over the entire length, 

or 

exposing an acetylcholine receptor comprising a polypeptide encoded by a nucleic 
acid comprising a sequence selected from the group consisting of a sequence 
as set forth in SEQ ID NO: 1, [subsequences of SEQ ID NO: 1 which are at 
least 14 base pairs in length, sequences which hybridize with SEQ ID NO: 1, 
sequences which have at least 70% identity to the sequence between position 
43 and position 1368 of SEQ ID NO: 1, sequences which are complementary 
to SEQ ID NO: 1, and] sequences which, owing to the degeneracy of the 
genetic code, encode the same amino acid sequence as [do the sequences 
defined above or an acetylcholine receptor comprising a polypeptide exerting 
the biological function of an acetylcholine receptor β subunit and comprising 
an amino acid sequence having at least 40% identity to SEQ ID NO: 2] does 
the sequence as set forth in SEQ ID NO: 1, and sequences which encode a 
polypeptide having an amino acid sequence which has at least 40% identity to 
the amino acid sequence as set forth in SEQ ID NO: 2 over the entire length, 

to at least one compound under [at least one] conditions permitting the interaction of 
the at least one compound with the host cell, the polypeptide or the receptor, 
and 

identifying the compound specifically binding to the receptor. 

ATTACHMENT A 



RECEIVED 

FEB 20 2003 

TECH CENTER 1600/2900 

Protein structure prediction 

Peer Mittl, Dr. 

Biochemisches Institut, Universität Zürich 
Tel: 01-635 6559 
e-mail: mittl@bioc.unizh.ch 




Overview 

1. Introduction 

• Protein structure prediction - why do we need this? 

• Principles of protein structure. 

- the conformational space 

- hierarchical organisation of protein structures 

• Sequence/structure relationship 

• Protein structure databanks 

2. Disciplines of structure prediction 

• Secondary structure prediction 

- The Chou & Fasman method (examples) 

- Neuronal networks (examples) 

• ab initio three-dimensional structure prediction 

• Threading (fold-recognition methods) 

- 1D-3D profiles 

- knowledge based potentials 

• Homology modelling 



Protein structure prediction - why do we need this? 

1. Folding & structure prediction are two closely related issues 

native ribonuclease -> (heat, reduce) -> unfolded ribonuclease -> (oxidise) -> refolded ribonuclease 

state       native                      unfolded       refolded 
activity    100 %                       no activity    fully active 
structure   well defined 3D structure   random coil    well defined 3D structure 

-> Information for structure is hidden in the amino acid sequence. 

2. Functional aspects: 

- functional details (which residue is involved in ...) 

- function for novel proteins (genome sequencing projects) 

- epitopes for protein/protein interactions (antigen/antibodies) 

- drug binding (resistance against anti-HIV drugs) 



3. Number of experimentally determined 3D-structures lags behind the number of 
protein sequences. 



Principles of Protein Structure 

Protein structures are organised hierarchically. 



level of hierarchy          elements 

atoms                       H, C, N, O, S (P, Se) 
residues                    20 natural amino acids (Ala, ... Trp), modified amino acids 
secondary structure         α-helix, β-sheet (parallel/anti-parallel), loops, 
                            3₁₀-helix, collagen helix 
super-secondary structure   4-helix bundle fold (α-α-α-α), Rossmann fold (α-β-α-β), ... 
domains                     Ig-like domain, NAD-binding domain, ... 
globular proteins           myoglobin, haemoglobin, actin, ... 
protein aggregates          homo/hetero dimers, trimers, ..., viruses, 
                            filaments, (ribo)somes 



Atoms and residues 

Conformation defined by: 




[Figure: atom and residue conformation; surviving labels: ± 20°, ± 40°, nitrogen, hydrogen, unfavourable, favourable] 

Dihedral angles of the main-chain 



The conformation of the main-chain is restricted: 

1. peptide bond mainly trans (ω = 180°), 

cis conformations (ω = 0°) before proline residues 

2. φ/ψ-angles are restricted due to clashes between 
Cβ-atom and main-chain atoms 



φ/ψ-plots (Ramachandran plot) 

[Figure: Ramachandran plots for non-Glycine residues and for Glycine residues, showing allowed and disallowed regions] 



Secondary structural elements 

repetitive elements: α-helix, β-sheet (parallel/anti-parallel), 3₁₀-helix, 

collagen helix 

loops: reverse turns (several types), random coil 

Secondary structural elements are defined by hydrogen bonds 

3₁₀-helix:   O(i) ... N(i+3) 
α-helix:     O(i) ... N(i+4) 
(π-helix):   O(i) ... N(i+5) 





Two types of reversed turns 



Super-secondary structure 

• Secondary structural elements are combined to form higher structural 
aggregates (fold, motifs). 

• Defined by: secondary structural elements involved 
              relative orientation between elements 
              connectivity between elements 

• Three main classes: all-α, all-β, αβ 

[Figure: 12 different all-β folds] 



Domains 

• domains consist of super-secondary structural elements 

• residues that are far apart in the sequence but close together in space form a 
domain (neighborhood relationship) 

• domains are detected by Cα-distance matrix 

• little flexibility within a domain but large movements between domains 

[Figure: 1st domain, 2nd domain] 




Globular Proteins 



Domains are modular building blocks for globular proteins 



[Figure: domain architecture of Epidermal growth factor (EGF), Chymotrypsin, Urokinase, Factor IX and Plasminogen] 

Legend: 
EGF-like domain (53 aa) 
Serine protease domain (245 aa) 
Kringle domain (85 aa) 
Calcium-binding domain 



Protein aggregates 



• globular proteins aggregate to form higher order oligomers 

• oligomerization is guided by complementary surfaces 
(shape, electrostatic properties, hydrophobic patches, ....) 



Stoichiometry and symmetry (no sym., rotational sym., helical sym.) 

Stoichiometry    Example 
A1               myoglobin 
A4               haemoglobin 
A2B2             caspase-3 
A, B, C, ...     ribosome, picorna virus 
A99999           amorphous, actin 



2. Sequence / structure relationship 



Methods of sequence/sequence comparison (lecture 8 & 9, G. Capitani) 
different alignment methods, optimisation of similarity score 



sequence similarity: substitution matrix, threshold value 
sequence identity:   identity matrix, thresholds 

BLOSUM62 (threshold > 0): 

AGPVGPEDVS 
1631672201 
SGYLGPDEAA 

90% similarity 

identity matrix (threshold 1): 

AGPVGPEDVS 
0100110000 
SGYLGPDEAA 

30% identity 
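
The two percentages above can be reproduced with a short sketch. It assumes an ungapped alignment of equal-length sequences; the function names are illustrative, and the per-position scores are copied from the BLOSUM62 row as printed in this attachment rather than looked up in a substitution matrix.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same residue (ungapped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def percent_above_threshold(scores, threshold=0):
    """Percentage of positions whose substitution score exceeds the threshold."""
    return 100.0 * sum(s > threshold for s in scores) / len(scores)


SEQ_A = "AGPVGPEDVS"
SEQ_B = "SGYLGPDEAA"
# Assumption: scores are taken digit by digit from the attachment, not recomputed.
SLIDE_SCORES = [1, 6, 3, 1, 6, 7, 2, 2, 0, 1]

identity = percent_identity(SEQ_A, SEQ_B)
similarity = percent_above_threshold(SLIDE_SCORES, threshold=0)
print(f"{identity:.0f}% identity, {similarity:.0f}% similarity")
```

Identity counts only exact matches, whereas similarity also counts conservative substitutions, which is why the same pair scores so differently on the two measures.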



Methods of structure/structure comparison 

1.) the root mean square deviation (rmsd) 

2.) secondary structure identity 



Root mean square deviation (rmsd) 

rmsd = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \Delta x_i^2} 

for atom pairs i = 1 to n 

\Delta x_i: distance between two compared atoms in space 
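
A minimal sketch of the rmsd calculation follows. It assumes the two structures are already superimposed and that atoms are paired by list position; the function name and the coordinates are illustrative and are not taken from the attachment.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation over paired (x, y, z) atom coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of pair i.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: each atom pair is exactly 1 Å apart.
structure_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
print(rmsd(structure_1, structure_2))
```

The sketch does not search for the best superposition; in practice the superposition itself is optimised so that the reported rmsd is the minimum over rigid-body placements.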



find starting superposition -> cyclic minimisation of rmsd 

[Figure: superposition before and after minimisation; rmsd = 1.6 Å, rmsd = 1.2 Å] 



Rigid body movements increase the rmsd 



[Figure: Protein 1 (2 domains) superimposed on Protein 2 (2 domains)] 

domain I: rmsd (low), domain II: rmsd (high) 
domain I: rmsd (high), domain II: rmsd (low) 



Sequence identity and rmsd of Sperm Whale myoglobin 




How does sequence identity correlate with 
structural similarity? 



Analysis by Chothia & Lesk (1986) 

• 100 % sequence identity: 
rmsd is of the order of the 
experimental co-ordinate error 

• < 25 % sequence identity: 
structures might be similar 
(twilight-zone), but they might 
also be different 

• rigid body movement can enhance 
rmsd although structures are similar 

[Plot: rmsd against percent residue identity, axis running from 100 to 20] 



Sequence / sequence versus structure / structure alignment 

Example: Hirustasin and Decorsin 
are leech derived inhibitors 

25.6 % sequence identity 



Decorsin (sequence)    --APRLPQCQGDDQE KCLCNK DECPPGQCRFPRGDADP-YCE 
Hirustasin             TQGNTCGGETCSAAQVCLK GKCVCNEVHCRIRCKYGLKKDENGCEYPCSCAKASQ 
Decorsin (structure)   APRLPQCQGDDQEKCLCNKDECPP-GQCRFPRGDADPYCE 
                       (loop 1, loop 2, loop 3) 

Conclusion: at low sequence identity the sequence/sequence 
alignment might imply a wrong structure/structure alignment. 



Secondary structure identity as a measure for structural identity 



1. align sequences 

2. assign secondary structure to sequence 

             1        10        20        30     37 
Protein I    BBBBBBCCCCBBBBCCCBBBBCCCBBBBCCCBBBBB 
Protein II   BBBBCCCCBBBBBBCCCBBBBCCCBBBBCCCBBBBC 

(37-5)/37 = 0.86 
86 % secondary structure identity 

secondary structure identity > 70% -> structures similar 
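
The calculation can be sketched as below. The two assignment strings are copied as they appear in this copy of the attachment, where each contains 36 rather than 37 characters, so the sketch yields 31/36 instead of 32/37; both round to 86 %. The function name is illustrative.

```python
def secondary_structure_identity(assignment_a: str, assignment_b: str):
    """Return (number of mismatches, fraction of positions with the same state)."""
    if len(assignment_a) != len(assignment_b):
        raise ValueError("assignments must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(assignment_a, assignment_b))
    length = len(assignment_a)
    return mismatches, (length - mismatches) / length


# B = beta strand, C = coil, as printed in the attachment.
PROTEIN_I = "BBBBBBCCCCBBBBCCCBBBBCCCBBBBCCCBBBBB"
PROTEIN_II = "BBBBCCCCBBBBBBCCCBBBBCCCBBBBCCCBBBBC"

mismatches, identity = secondary_structure_identity(PROTEIN_I, PROTEIN_II)
similar = identity > 0.70  # similarity criterion stated in the attachment
print(f"{mismatches} mismatches, {identity:.0%} identity, similar: {similar}")
```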



What is the minimal alignment length to deduce 
structural similarity from sequence identity? 

Analysis of Schneider & Sander (1991) 
Definition: two structures are similar if: 

1. rmsd < 2.5 Å 

2. secondary structure identity > 70% 

Alignment length    sequence identity threshold 
< 10                79.6 % 
20                  53.9 % 
30                  43.0 % 
40                  36.6 % 
60                  29.1 % 
> 80                24.8 % 



Conclusion 

• We can deduce structural similarity between two proteins 

if the sequence identity is > 25 % over an alignment length of 
at least 80 amino acids. 

• Protein structures can be similar even at very low sequence 
identity (twilight-zone). 

• Structure is better conserved than sequence. 
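
The length-dependent threshold can be applied with a simple lookup, sketched below. The tabulated values are those given in the attachment; everything else is an assumption made for illustration: lengths between tabulated points use the stricter (shorter-length) threshold, the "< 10" row is also applied to lengths from 10 to 19, and the "> 80" row is applied from 80 upwards to match the wording of the conclusion.

```python
# (minimum alignment length, identity threshold in %), longest first.
THRESHOLDS = [(80, 24.8), (60, 29.1), (40, 36.6), (30, 43.0), (20, 53.9)]
# Row "< 10" of the table; assumed here to cover every length below 20.
SHORT_ALIGNMENT_THRESHOLD = 79.6


def identity_threshold(alignment_length: int) -> float:
    """Conservative identity threshold (%) for a given alignment length."""
    for min_length, threshold in THRESHOLDS:
        if alignment_length >= min_length:
            return threshold
    return SHORT_ALIGNMENT_THRESHOLD


def implies_structural_similarity(identity_percent: float, alignment_length: int) -> bool:
    """True if the identity exceeds the threshold for this alignment length."""
    return identity_percent > identity_threshold(alignment_length)


# Hypothetical inputs: 40 % identity over 100 residues, 30 % over 35 residues.
print(implies_structural_similarity(40.0, 100))
print(implies_structural_similarity(30.0, 35))
```

Choosing the stricter threshold between tabulated points errs on the side of not inferring similarity; interpolation would be a less conservative alternative.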



Databanks for protein structures 



Brookhaven Protein databank (PDB): experimentally determined protein structures 
http://www.rcsb.org/pdb/ 

Cambridge Crystallographic Data Centre: Crystal structures of small molecules 
http://www.ccdc.cam.ac.uk/ 

BioMagResBank: NMR protein structures 
http://www.bmrb.wisc.edu/ 

ModBase: Database of homology models 
http://pipe.rockefeller.edu/ 

SCOP, CATH, FSSP: hierarchical clustering of structures 
http://scop.mrc-lmb.cam.ac.uk/scop/ 
http://www.biochem.ucl.ac.uk/bsm/cath_new 
http://www2.embl-ebi.ac.uk/dali/fssp/fssp.html 

Relibase: structures of proteins/small molecule complexes 
http://relibase.ebi.ac.uk/ 



Hierarchical Structure Classifications in SCOP 

Murzin, Brenner, Hubbard & Chothia; 1995 

Levels of the hierarchy: 

1. class 
2. fold 
3. superfamily 
4. family 
5. protein 
6. species 
7. depositions 

[Figure: example classification tree; surviving labels: β-glucanase, α-amylase, RuBisCo, β-amylase, β-galactosidase, cyclodextrin glyco.trans., oligo-1,6-glucosidase; A.niger, B.circulans, B.stearothermophilus, B.cereus; 2aaa, 1cdg, 1cgt, 1cyg, 1cbc] 



Structural Classification of Proteins 



Superfamily: Globin-like 

Lineage: 

1. Root: scop 
2. Class: All alpha proteins 
3. Fold: Globin-like. 
   core: 6 helices; folded leaf, partly opened 
4. Superfamily: Globin-like 

Families: 

1. Truncated hemoglobin (2) 
2. Globins (52) 
   Heme-binding protein 
3. Phycocyanins (10) 
   oligomers of two different types of homologous subunits 
   each subunit contains 2 additional helices at the N-terminus 
   binds a chromophore 

Protein Domains: 

1. Hemoglobin I 
   1. Ark clam (Scapharca inaequivalvis) (10) 
   2. Clam (Lucina pectinata) (4) 
2. Glycera globin 
   1. Marine bloodworm (Glycera dibranchiata) (4) 
3. Myoglobin 
   1. Sperm whale (Physeter catodon) (131) 
   2. Sea hare (Aplysia limacina) (7) 
   3. Common seal (Phoca vitulina) (1) 
   4. Pig (Sus scrofa) 
   5. Horse (Equus caballus) (13) 

PDB Entry Domains: 

1. 1a6m 
   complexed with hem, oxy, so4 
2. latik 
   complexed with hem, so4 
3. ihzp 
   complexed with hem, so4 
   1. chain a 



Disciplines of structure prediction 



1.) Prediction of secondary structure 

a. method of Chou & Fasman 

b. neural networks 

2.) Prediction of tertiary structure 

a. ab initio structure prediction 

b. threading 
- 1D-3D profiles 
- knowledge based potentials 

c. homology modelling 

[Figure annotation: the listed methods are grouped into predictive methods and modelling methods] 



The Chou & Fasman method for secondary 
structure prediction 

Chou & Fasman (1974) Biochemistry 13, 211-221, 222-245 

1. Probabilities for all amino acids to be either in α-helix, β-sheet or coil. 
20 * 3 probabilities 

2. Set knowledge-based rules to apply probability tables for prediction 

protein 1    AGPDFVILKRW... 
             CCHHHHCCHHH... 

protein 2    PGSAALHVIYI... 
             CCBBBBCCHHH... 

... 

protein 15   MVEDEHVILTAGF... 
             CBBBBBBBBBHHH... 

No. of residues 

Amino acid   #total   #helix   #sheet   #coil 
Ala          228      119      38       71 
Arg          78       22       12       44 
... 
Val          181      74       51       56 

Databank of 15 structures -> Assignment of secondary structures to sequences -> List of observations 



From observations to probabilities 



Amino    observations                       frequency                    probability
acid     #total  #helix  #sheet  #coil      f_helix  f_sheet  f_coil     P_helix  P_sheet  P_coil

Ala      228     119     38      71         0.52     0.17     0.31       1.45     0.97     0.66
Arg       78      22     12      44         0.28     0.15     0.56       0.79     0.90     1.20
Val      181      74     51      56         0.41     0.28     0.31       1.14     1.65     0.66

average <f>                                 0.36     0.17     0.47       1.0      1.0      1.0

Frequency:   f_helix(Ala) = #helix(Ala) / #total(Ala) = 119/228 = 0.52

Probability: P_helix(Ala) = f_helix(Ala) / <f_helix> = 0.52/0.36 = 1.45
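The step from observations to probabilities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts for Ala, Arg and Val and the average frequencies 0.36, 0.17 and 0.47 are the values printed on the slide, while the names `COUNTS`, `AVERAGE_F`, `frequencies` and `propensities` are hypothetical. Because the slide averages are rounded to two decimals, some recomputed values may differ from the slide in the last digit.

```python
# Residue counts read from the slide: (#total, #helix, #sheet, #coil).
COUNTS = {
    "Ala": (228, 119, 38, 71),
    "Arg": (78, 22, 12, 44),
    "Val": (181, 74, 51, 56),
}

# Average frequencies <f> as printed on the slide (rounded to two decimals).
AVERAGE_F = {"helix": 0.36, "sheet": 0.17, "coil": 0.47}
STATES = ("helix", "sheet", "coil")


def frequencies(total, helix, sheet, coil):
    """Fraction of a residue's occurrences observed in each state."""
    return dict(zip(STATES, (helix / total, sheet / total, coil / total)))


def propensities(freqs, average=AVERAGE_F):
    """Frequency of each state divided by the average frequency of that state."""
    return {state: freqs[state] / average[state] for state in STATES}


for name, row in COUNTS.items():
    f = frequencies(*row)
    p = propensities(f)
    print(name,
          {s: round(f[s], 2) for s in STATES},
          {s: round(p[s], 2) for s in STATES})
```

A value above 1.0 means the residue occurs in that state more often than the average residue does; a value below 1.0 means less often.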



Knowledge-based rules for the prediction of
secondary structural elements

1. Method predicts α-helix, β-sheet and β-turns

2. Rules for secondary structure assignment:

a. Find the nucleation centre for α-helix or β-sheet
α-helix: 4 residues out of 6 with P_helix > 1.06
         <P_helix>_6 > 1.03
β-sheet: 3 residues out of 5 with P_sheet > 1.05

b. Extend nucleation centre to both sides
α-helix: stop if <P_helix>_4 < 1.00 for 4 residues
β-sheet: stop if <P_sheet>_4 < 1.00 for 4 residues

c. Decide between α-helix and β-sheet
α-helix: <P_helix>_4 > <P_sheet>_4
β-sheet: <P_sheet>_4 > <P_helix>_4
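The nucleation step (rule a) can be sketched as a sliding-window search. This is a minimal illustration, not the original implementation. It assumes that "4 residues out of 6" means at least 4, and the function names and example propensity values are hypothetical. The thresholds 1.06, 1.03 and 1.05 are the ones stated in the rules.

```python
def find_nuclei(propensity, window, min_hits, threshold, min_mean=None):
    """Return start indices of windows that satisfy a nucleation rule.

    A window qualifies when at least `min_hits` of its residues exceed
    `threshold` and, if `min_mean` is given, its mean exceeds `min_mean`.
    """
    starts = []
    for start in range(len(propensity) - window + 1):
        segment = propensity[start:start + window]
        hits = sum(1 for p in segment if p > threshold)
        mean = sum(segment) / window
        if hits >= min_hits and (min_mean is None or mean > min_mean):
            starts.append(start)
    return starts


def helix_nuclei(p_helix):
    # alpha-helix: 4 residues out of 6 with P_helix > 1.06 and <P_helix>_6 > 1.03
    return find_nuclei(p_helix, window=6, min_hits=4, threshold=1.06, min_mean=1.03)


def sheet_nuclei(p_sheet):
    # beta-sheet: 3 residues out of 5 with P_sheet > 1.05
    return find_nuclei(p_sheet, window=5, min_hits=3, threshold=1.05)


# Hypothetical per-residue P_helix values, chosen only for illustration.
example = [0.5, 0.6, 1.2, 1.3, 0.9, 1.4, 1.1, 1.2, 0.5, 0.4, 0.3]
print(helix_nuclei(example))
```

Overlapping windows that qualify would normally be merged into one nucleation centre before the extension step (rule b); that merging is left out here to keep the sketch short.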



Prediction of β-turns

β-turns:

<P_turn>_4 > 1.00 for 4 residues
<P_turn>_4 > <P_helix>_4 or <P_sheet>_4

p_turn = f(i) * f(i+1) * f(i+2) * f(i+3)
p_turn > 0.75 * 10^-4

Turn positions: i (N-term.), i+1, i+2, i+3 (C-term.)

β-turn frequency

Amino
acid    f(i)     f(i+1)   f(i+2)   f(i+3)

A       0.060    0.076    0.035    0.058
C       0.149    0.053    0.117    0.128
D       0.147    0.110    0.179    0.081
E       0.056    0.060    0.077    0.064
F       0.059    0.041    0.065    0.065
G       0.102    0.085    0.190    0.152
H       0.140    0.047    0.093    0.054
I       0.043    0.034    0.013    0.056
K       0.055    0.115    0.072    0.095
L       0.061    0.025    0.036    0.070
M       0.068    0.082    0.014    0.055
N       0.161    [illegible]  0.191    0.091
P       0.102    0.301    0.034    0.068
Q       0.074    0.098    0.307    0.098
R       0.070    0.106    0.099    0.085
S       0.120    0.139    0.125    0.106
T       0.086    0.108    0.065    0.079
V       0.062    0.048    0.028    0.053
W       0.077    0.013    0.064    0.167
Y       0.082    0.065    0.114    0.125
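The product rule for β-turns can be checked with a short Python sketch. It is an illustration under stated assumptions: only five rows of the frequency table are copied, the names `TURN_FREQ`, `THRESHOLD` and `turn_probability` are hypothetical, and the Pro f(i+1) entry, which is garbled in the table, is taken as 0.301 because that value appears next to Pro in the worked tetrapeptide boxes later in the document.

```python
# Positional beta-turn frequencies (f(i), f(i+1), f(i+2), f(i+3)) for a few
# residues, copied from the frequency table. The Pro f(i+1) value of 0.301 is
# an assumption taken from the worked example, since the table entry is garbled.
TURN_FREQ = {
    "G": (0.102, 0.085, 0.190, 0.152),
    "K": (0.055, 0.115, 0.072, 0.095),
    "P": (0.102, 0.301, 0.034, 0.068),
    "S": (0.120, 0.139, 0.125, 0.106),
    "T": (0.086, 0.108, 0.065, 0.079),
}

THRESHOLD = 0.75e-4  # p_turn must exceed 0.75 * 10^-4


def turn_probability(tetrapeptide):
    """Product of the positional frequencies of four consecutive residues."""
    if len(tetrapeptide) != 4:
        raise ValueError("a beta-turn is scored over exactly four residues")
    p = 1.0
    for position, residue in enumerate(tetrapeptide):
        p *= TURN_FREQ[residue][position]
    return p


for peptide in ("GPGS", "PGSG"):
    p = turn_probability(peptide)
    print(peptide, f"{p:.2e}", p > THRESHOLD)
```

The product rule is only one of the conditions; a tetrapeptide is a turn candidate only if the averaged P_turn conditions listed with it also hold.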



Amino acid preference for α-helices

Table IV: Frequency of Helical Boundary and Central Residues* in 15 Proteins.



Aw* 



Ala 

Trp 
Thr 
Gin 
Phe 
A*n 
Ser 

Cy 

Met 

Tyr 

Be 

Val 

Gly 

Lyi<+> 

Leu 



ArgW 



0.140 
0.136 
0.122 
0.116 
0.098 
0.090 
0.079 
0.074 
0.071 
0.070 
0.066 
0.061 
0.060 
0.057 
0.036 



w 



0.054 
0.038 



Glri 
Ar 8 <+) 
Cys 
Met 
Glu'-> 
Ala 
Val 
Phe 
Leu 
Ash 
Sa- 
ne 

Asp'"' 

Tyr 

Thr 

Trp 

Gly 



Ac' 



0.216 
Ot60 



0.158 
0.154 
0.145 
0.143 
0.124 
0.118 
0.116 
0.110 
0.102 
0.090 
0.084 
0.075 
0.054 
0.050 
0.045 
0.045 
0.039 



0.097 



Hifl«+> 
Pro 
So- 
Oty 
Asn 
lie 
Phe 
Gin 
Leu 
Asp<-> 
Glu<-> 
Tyr 
Lys<+> 
Val 
Met 
Trp 
Thr 
Cyj 
Ala 
Arg ( +> 



0.162 
0.141 
0.104 
0.103 
0.098 
0.085 
0.085 
0.084 
0.082 
0.081 
0.080 
0.080 
0.074 
0.072 
0.071 
0.068 
0.064 
0.056 
0.044 
0.038 
0.082 



H| S t+» 

Asp'-' 

Lys (+ > 

Asn 

Arg< + > 

Gly 

lie 

Pro 

Cys 

Thr 

Tyr 

Phe 

Met 

Leu 

Glu ( -» 

Ala 

Gin 

Ser. 

Trp 

Vat 



0.149 
0.135 
0.120 
0.120 
0.115 
0.112 
0.094 
0.094 
0.093 
0.090 
0.090 
0.073 
0.071 
0.061 
0.053 
0.044 
0.042 
0.040 
0.023 
0.022 
0.080 



Ala 

Phe 

Leu 

GIu<-> 

VaJ 

Gin 

Met 

Lys< +) 

Trp 

Aap<-> 

Cy» 

Arg< + > 

Ser 

Thr 

Asn 

lie 

His ( *> 

Tyr 

Gly 



[Pro 



0.184 
0.183 
0.179 
0.177 
0.166 
0.158 
0.143 
0.126 
0.114 
0.099 
0.093 
0.090 
0.084 
0.083 
0.075 
0.075 
0.068 
0.050 
0.034 
~Q I 



0.112 




■ proline residues frequently at N-terminus, never at C-terminus
or inside an α-helix.

■ negatively charged side-chains at N-terminus.

■ positively charged side-chains at C-terminus, rarely at
N-terminus.



4. Expand nucleation site for β-sheet

Residue   P_helix   P_sheet

Met       1.45      1.05
Glu       1.51      0.37
Glu       1.51      0.37
Lys       1.16      0.74
Leu       1.21      1.30
Lys       1.16      0.74
Lys       1.16      0.74
Ser       0.77      0.75
Lys       1.16      0.74
Ile       1.08      1.60
Ile       1.08      1.60
Phe       1.13      1.38

Val       1.06      1.70
Val       1.06      1.70
Gly       0.57      0.75
Gly       0.57      0.75
Pro       0.57      0.55
Gly       0.57      0.75
Ser       0.77      0.75
Gly       0.57      0.75
Lys       1.16      0.74
Gly       0.57      0.75
Thr       0.83      1.19
Gln       1.11      1.10

Window averages legible in the figure (bracket positions not recoverable):
1.02, 1.04, 1.06, 1.08, 1.37, 1.10, 1.33, 1.17, 0.96, 1.23, 0.97



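6. Expand the β-turn nucleation sites

Residue   P_turn

Val       0.50
Val       0.50
Gly       1.56
Gly       1.56
Pro       1.52
Gly       1.56
Ser       1.43
Gly       1.56
Lys       1.01
Gly       1.56
Thr       0.96
Gln       0.98

Window averages <P_turn>_4 shown in the figure: 1.55, 1.13

Tetrapeptide       f(i)     f(i+1)   f(i+2)   f(i+3)   p_turn (* 10^-4)

Gly-Gly-Pro-Gly    0.102    0.085    0.034    0.152    0.45
Gly-Pro-Gly-Ser    0.102    0.301    0.190    0.106    6.18
Pro-Gly-Ser-Gly    0.102    0.085    0.125    0.152    1.65
Gly-Ser-Gly-Lys    0.102    0.139    0.190    0.095    2.56
Ser-Gly-Lys-Gly    0.120    0.085    0.072    0.152    1.12
Gly-Lys-Gly-Thr    0.102    0.115    0.190    0.079    1.76
Lys-Gly-Thr-Gln    0.055    0.085    0.065    0.098    0.30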
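The window averages used in this step can be recomputed with a short sliding-window sketch. It is an illustration only. The per-residue P_turn values are the ones legible in the figure for the sequence Val-Val-Gly-Gly-Pro-Gly-Ser-Gly-Lys-Gly-Thr-Gln, and the names `P_TURN`, `SEQUENCE` and `window_means` are hypothetical.

```python
# Per-residue P_turn values as read from the figure (one-letter codes).
P_TURN = {"V": 0.50, "G": 1.56, "P": 1.52, "S": 1.43,
          "K": 1.01, "T": 0.96, "Q": 0.98}

SEQUENCE = "VVGGPGSGKGTQ"


def window_means(sequence, table, window=4):
    """Return (tetrapeptide, mean table value) for every window in the sequence."""
    values = [table[residue] for residue in sequence]
    return [
        (sequence[i:i + window], sum(values[i:i + window]) / window)
        for i in range(len(sequence) - window + 1)
    ]


for peptide, mean in window_means(SEQUENCE, P_TURN):
    # A window is only a turn candidate if <P_turn>_4 exceeds 1.00.
    note = "candidate" if mean > 1.00 else ""
    print(peptide, f"{mean:.3f}", note)
```

The windows Gly-Gly-Pro-Gly and Lys-Gly-Thr-Gln give averages that round to the 1.55 and 1.13 printed in the figure.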



SEQUENCE LISTING 

<110> Bayer Aktiengesellschaft

<120> Nucleic acids coding for new acetylcholine receptor beta subunits of
insects

<130> Le A 34 147

<140>
<141>

<150> DE 199 59 582.8
<151> 1999-12-10

<160> 4

<170> PatentIn Ver. 2.1

<210> 1
<211> 1539
<212> DNA
<213> Drosophila melanogaster

<220>
<221> CDS
<222> (43)..(1365)
<400> 1 

attcggcacg agggtacatc cgaaacaaag gcgcgctgaa ca atg acg acg act   54
                                               Met Thr Thr Thr
                                                 1

ccc aag ata aag gca cca gtt tcc ggt cct gga ctg cca cta ctg ctg   102
Pro Lys Ile Lys Ala Pro Val Ser Gly Pro Gly Leu Pro Leu Leu Leu
5                   10                  15                  20

caa atg cta atg ggg atg ctt ctt atg ggg ctg act tcc gtg cca ggc   150
Gln Met Leu Met Gly Met Leu Leu Met Gly Leu Thr Ser Val Pro Gly
                25                  30                  35

gcc act gcc acc gcg gac ccc aag aac gcc aat gtc aag gcc ctg gat   198
Ala Thr Ala Thr Ala Asp Pro Lys Asn Ala Asn Val Lys Ala Leu Asp
            40                  45                  50

cgc ctc cac gcc ggc ctg ttc acg aac tac gac agc gat gtg cag ccg   246
Arg Leu His Ala Gly Leu Phe Thr Asn Tyr Asp Ser Asp Val Gln Pro
        55                  60                  65

gtg ttc caa gga acc ccc acg aac gtg tcc ctg gaa atg gtg gtc acc   294
Val Phe Gln Gly Thr Pro Thr Asn Val Ser Leu Glu Met Val Val Thr
    70                  75                  80

tac ata gac atc gac gag ttg aac ggc aag ctg acc acc cac tgc tgg   342
Tyr Ile Asp Ile Asp Glu Leu Asn Gly Lys Leu Thr Thr His Cys Trp
85                  90                  95                  100

ctg aat ctc cga tgg aga gac gag gag cgc gtg tgg caa ccg tca caa   390
Leu Asn Leu Arg Trp Arg Asp Glu Glu Arg Val Trp Gln Pro Ser Gln
                105                 110                 115

tat gac aac atc acg cag atc act ttg aag tcc agc gag gtc tgg acc   438
Tyr Asp Asn Ile Thr Gln Ile Thr Leu Lys Ser Ser Glu Val Trp Thr
            120                 125                 130

ccc caa atc aca ctc ttc aac ggc gac gaa ggt ggc ctg atg gcc gaa   486
Pro Gln Ile Thr Leu Phe Asn Gly Asp Glu Gly Gly Leu Met Ala Glu
        135                 140                 145

acc cag gtg acc ctc agc cac gat ggc cac ttc cgg tgg atg cct cca   534
Thr Gln Val Thr Leu Ser His Asp Gly His Phe Arg Trp Met Pro Pro
    150                 155                 160

gcc gtg tac acg gcc tac tgc gaa ctc aac atg ctc aac tgg ccc cac   582
Ala Val Tyr Thr Ala Tyr Cys Glu Leu Asn Met Leu Asn Trp Pro His
165                 170                 175                 180

gac aag cag agc tgc aag ttg aag atc ggc tcc tgg ggc ctg aag gtc   630
Asp Lys Gln Ser Cys Lys Leu Lys Ile Gly Ser Trp Gly Leu Lys Val
                185                 190                 195

gtc ctg ccg gag aac ggc acg gcg aga gga gag tcc ctt gac cac gac   678
Val Leu Pro Glu Asn Gly Thr Ala Arg Gly Glu Ser Leu Asp His Asp
            200                 205                 210

gac ctg gtt cag tca ccg gag tgg gaa atc gtg gac tcg cga gcc cac   726
Asp Leu Val Gln Ser Pro Glu Trp Glu Ile Val Asp Ser Arg Ala His
        215                 220                 225

ttt gtc agt cag gac tac tac ggc tac atg gag tac act ctg acg gct   774
Phe Val Ser Gln Asp Tyr Tyr Gly Tyr Met Glu Tyr Thr Leu Thr Ala
    230                 235                 240

cag cgg cgc tcc tcc atg tac acg gcc gtc atc tac aca ccc gcg tcc   822
Gln Arg Arg Ser Ser Met Tyr Thr Ala Val Ile Tyr Thr Pro Ala Ser
245                 250                 255                 260

tgc atc gtc atc ctg gcc ctc tca gcc ttc tgg ctg cct ccc cac atg   870
Cys Ile Val Ile Leu Ala Leu Ser Ala Phe Trp Leu Pro Pro His Met
                265                 270                 275

ggc ggc gag aag atc atg atc aac ggc ctg ctc atc atc gtg atc gcc   918
Gly Gly Glu Lys Ile Met Ile Asn Gly Leu Leu Ile Ile Val Ile Ala
            280                 285                 290

gcc ttc ctc atg tac ttc gcc cag ctc ctg cca gtg ctg tcc aac aat   966
Ala Phe Leu Met Tyr Phe Ala Gln Leu Leu Pro Val Leu Ser Asn Asn
        295                 300                 305

act cca ctt gtg gta atc ttc tac agc acc agc ctg ctg tat ctg agc   1014
Thr Pro Leu Val Val Ile Phe Tyr Ser Thr Ser Leu Leu Tyr Leu Ser
    310                 315                 320

gtc tcc acc atc gtc gag gtt cta gtt ctg tac ctg gcc aca ggc aag   1062
Val Ser Thr Ile Val Glu Val Leu Val Leu Tyr Leu Ala Thr Gly Lys
325                 330                 335                 340

cac aag agg cgc ctg ccg gag gcg ctg aga aag ctg ctg cac ggg cac   1110
His Lys Arg Arg Leu Pro Glu Ala Leu Arg Lys Leu Leu His Gly His
                345                 350                 355

ctg ggc acg tgg ctg ctg ctc tcg gtg ttc agc acc act ggc gag tcg   1158
Leu Gly Thr Trp Leu Leu Leu Ser Val Phe Ser Thr Thr Gly Glu Ser
            360                 365                 370

cag gcg gag aag acc aaa gag atg gac gag cac ccg tac gag gag gcg   1206
Gln Ala Glu Lys Thr Lys Glu Met Asp Glu His Pro Tyr Glu Glu Ala
        375                 380                 385

gac gag cag gag tcc agt ccg ctg ggc atc aac cac acc gag gtg ccg   1254
Asp Glu Gln Glu Ser Ser Pro Leu Gly Ile Asn His Thr Glu Val Pro
    390                 395                 400

ggc gcc aag gcc aac cag ttc gac tgg gcg ctg ctg gcc acc gcc gtg   1302
Gly Ala Lys Ala Asn Gln Phe Asp Trp Ala Leu Leu Ala Thr Ala Val
405                 410                 415                 420

gac cgc att tcc ttc gtt tcc ttc agc ctg gcc ttc ctc att ctg gcc   1350
Asp Arg Ile Ser Phe Val Ser Phe Ser Leu Ala Phe Leu Ile Leu Ala
                425                 430                 435

atc agg tgc tcc gtg tagggatgct cgagactcaa ggccacatcc caagccagtg   1405
Ile Arg Cys Ser Val
            440

cgcactctga actagttttg catttgcgat ttcatgtatt taatgtgtgt gcgaacttat   1465
aattatttaa tgatgagacc tcgtatggaa taaaggacct ctgccgaatg tctgcttaca   1525
aaaaaaaaaa aaaa                                                     1539

<210> 2 
<211> 441 
<212> PRT 

<213> Drosophila melanogaster 
<400> 2 

Met Thr Thr Thr Pro Lys Ile Lys Ala Pro Val Ser Gly Pro Gly Leu
1               5                   10                  15

Pro Leu Leu Leu Gln Met Leu Met Gly Met Leu Leu Met Gly Leu Thr
            20                  25                  30

Ser Val Pro Gly Ala Thr Ala Thr Ala Asp Pro Lys Asn Ala Asn Val
        35                  40                  45

Lys Ala Leu Asp Arg Leu His Ala Gly Leu Phe Thr Asn Tyr Asp Ser
    50                  55                  60

Asp Val Gln Pro Val Phe Gln Gly Thr Pro Thr Asn Val Ser Leu Glu
65                  70                  75                  80

Met Val Val Thr Tyr Ile Asp Ile Asp Glu Leu Asn Gly Lys Leu Thr
                85                  90                  95

Thr His Cys Trp Leu Asn Leu Arg Trp Arg Asp Glu Glu Arg Val Trp
            100                 105                 110

Gln Pro Ser Gln Tyr Asp Asn Ile Thr Gln Ile Thr Leu Lys Ser Ser
        115                 120                 125

Glu Val Trp Thr Pro Gln Ile Thr Leu Phe Asn Gly Asp Glu Gly Gly
    130                 135                 140

Leu Met Ala Glu Thr Gln Val Thr Leu Ser His Asp Gly His Phe Arg
145                 150                 155                 160

Trp Met Pro Pro Ala Val Tyr Thr Ala Tyr Cys Glu Leu Asn Met Leu
                165                 170                 175

Asn Trp Pro His Asp Lys Gln Ser Cys Lys Leu Lys Ile Gly Ser Trp
            180                 185                 190

Gly Leu Lys Val Val Leu Pro Glu Asn Gly Thr Ala Arg Gly Glu Ser
        195                 200                 205

Leu Asp His Asp Asp Leu Val Gln Ser Pro Glu Trp Glu Ile Val Asp
    210                 215                 220

Ser Arg Ala His Phe Val Ser Gln Asp Tyr Tyr Gly Tyr Met Glu Tyr
225                 230                 235                 240

Thr Leu Thr Ala Gln Arg Arg Ser Ser Met Tyr Thr Ala Val Ile Tyr
                245                 250                 255

Thr Pro Ala Ser Cys Ile Val Ile Leu Ala Leu Ser Ala Phe Trp Leu
            260                 265                 270

Pro Pro His Met Gly Gly Glu Lys Ile Met Ile Asn Gly Leu Leu Ile
        275                 280                 285

Ile Val Ile Ala Ala Phe Leu Met Tyr Phe Ala Gln Leu Leu Pro Val
    290                 295                 300

Leu Ser Asn Asn Thr Pro Leu Val Val Ile Phe Tyr Ser Thr Ser Leu
305                 310                 315                 320

Leu Tyr Leu Ser Val Ser Thr Ile Val Glu Val Leu Val Leu Tyr Leu
                325                 330                 335

Ala Thr Gly Lys His Lys Arg Arg Leu Pro Glu Ala Leu Arg Lys Leu
            340                 345                 350

Leu His Gly His Leu Gly Thr Trp Leu Leu Leu Ser Val Phe Ser Thr
        355                 360                 365

Thr Gly Glu Ser Gln Ala Glu Lys Thr Lys Glu Met Asp Glu His Pro
    370                 375                 380

Tyr Glu Glu Ala Asp Glu Gln Glu Ser Ser Pro Leu Gly Ile Asn His
385                 390                 395                 400

Thr Glu Val Pro Gly Ala Lys Ala Asn Gln Phe Asp Trp Ala Leu Leu
                405                 410                 415

Ala Thr Ala Val Asp Arg Ile Ser Phe Val Ser Phe Ser Leu Ala Phe
            420                 425                 430

Leu Ile Leu Ala Ile Arg Cys Ser Val
        435                 440



<210> 3 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 3 

tggcarccit cicartayga



<210> 4 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 4 

catratytty tcicciccca t 